anchor pair
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing
Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential (reasoning-based) solutions, which capture the underlying compositional primitives, or symmetric (memory-based) solutions, which simply memorize mappings without understanding the compositional structure. By analyzing the information flow and vector representations within the model, we reveal the distinct mechanisms underlying these solution types. We further find that inferential (reasoning-based) solutions exhibit low complexity bias, which we hypothesize is a key factor enabling them to learn individual mappings for single anchors.
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers
Zhang, Zhongwang, Lin, Pengxiao, Wang, Zhiwei, Zhang, Yaoyu, Xu, Zhi-Qin John
Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate. In this study, we investigate the internal mechanisms underlying Transformers' behavior in compositional tasks. We find that complexity control strategies significantly influence whether the model learns primitive-level rules that generalize out-of-distribution (reasoning-based solutions) or relies solely on memorized mappings (memory-based solutions). By applying masking strategies to the model's information circuits and employing multiple complexity metrics, we reveal distinct internal working mechanisms associated with different solution types. Further analysis reveals that reasoning-based solutions exhibit a lower complexity bias, which aligns with the well-studied neuron condensation phenomenon. This lower complexity bias is hypothesized to be the key factor enabling these solutions to learn reasoning rules. We validate these conclusions across multiple real-world datasets, including image generation and natural language processing tasks, confirming the broad applicability of our findings.
- Asia > China > Shanghai > Shanghai (0.05)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing
Zhang, Zhongwang, Lin, Pengxiao, Wang, Zhiwei, Zhang, Yaoyu, Xu, Zhi-Qin John
Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solutions, which capture the underlying compositional primitives, or symmetric solutions, which simply memorize mappings without understanding the compositional structure. By analyzing the information flow and vector representations within the model, we reveal the distinct mechanisms underlying these solution types. We further find that inferential solutions exhibit low complexity bias, which we hypothesize is a key factor enabling them to learn individual mappings for single anchors. Building upon the understanding of these mechanisms, we can predict the learning behavior of models with different initialization scales when faced with data of varying complexity. Our findings provide valuable insights into the role of initialization scale in shaping the type of solution learned by transformers and their ability to learn and generalize compositional tasks.
Adaptive Anchor Pairs Selection in a TDOA-based System Through Robot Localization Error Minimization
The following paper presents an adaptive anchor pairs selection method for ultra-wideband (UWB) Time Difference of Arrival (TDOA) based positioning systems. The method divides the area covered by the system into several zones and assigns them anchor pair sets. The pair sets are determined during calibration based on localization root mean square error (RMSE). The calibration assumes driving a mobile platform equipped with a LiDAR sensor and a UWB tag through the specified zones. The robot is localized separately based on a large set of different TDOA pairs and using a LiDAR, which acts as the reference. For each zone, the TDOA pairs set for which the registered RMSE is lowest is selected and used for localization in the routine system work. The proposed method has been tested with simulations and experiments. The results for both simulated static and experimental dynamic scenarios have proven that the adaptive selection of the anchor nodes leads to an increase in localization accuracy. In the experiment, the median trajectory error for a moving person localization was at a level of 25 cm.
- Europe > Poland > Masovia Province > Warsaw (0.05)
- Europe > Poland > Łódź Province > Łódź (0.05)
- Europe > Serbia > Central Serbia > Belgrade (0.04)
- (2 more...)
Deep Discriminative Latent Space for Clustering
Tzoreff, Elad, Kogan, Olga, Choukroun, Yoni
Clustering is one of the most fundamental tasks in data analysis and machine learning. It is central to many data-driven applications that aim to separate the data into groups with similar patterns. Moreover, clustering is a complex procedure that is affected significantly by the choice of the data representation method. Recent research has demonstrated encouraging clustering results by learning effectively these representations. In most of these works a deep auto-encoder is initially pre-trained to minimize a reconstruction loss, and then jointly optimized with clustering centroids in order to improve the clustering objective. Those works focus mainly on the clustering phase of the procedure, while not utilizing the potential benefit out of the initial phase. In this paper we propose to optimize an auto-encoder with respect to a discriminative pairwise loss function during the auto-encoder pre-training phase. We demonstrate the high accuracy obtained by the proposed method as well as its rapid convergence (e.g.
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)